Skip to content
This repository was archived by the owner on Aug 19, 2019. It is now read-only.

Conversation

StevenYCChou
Copy link

Add interface inside api_server, so we expose instrumentation metrics later on (currently working on it).

Some thoughts:

  1. I added a private function MetadataApiServer::SerializeMetricsToPrometheusTextFormat() so we can test the functionality without considering http_connection, which isn't tested in current test suites.
  2. I only built prometheus-cpp core library with gzip compression disabled for 2 reason: 1) gzip is not used in our http server, 2) gzip compression requires system library unless we statically compile and link with gzip library.

Copy link
Contributor

@davidbtucker davidbtucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StevenYCChou
Copy link
Author

I noticed we use a lot of space indentation in Makefile, though I think only tab indentation should be used in Makefile. Will use separate PR to change that indentation.

Copy link
Member

@igorpeshansky igorpeshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add at least one actual metric to this PR? I understand that it'll make the PR larger, but it would also make it easier to see what the abstractions are and how we plan to track metrics in the client code. Also note that it's not easy to understand what "exposition" means in this context.

conn->write(response);
}

std::string MetadataApiServer::SerializeMetricsToPrometheusTextFormat() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest creating a Telemetry class to wrap this whole Collectable business and the registry and text serialization. Then you could have a field telemetry_ of type const Telemetry&, and this method can be replaced with telemetry_.MetricsToString() (or even telemetry_.ToString()).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current implementation, it is implemented as function inside google::internal namespace, https://github.com/Stackdriver/metadata-agent/pull/210/files#diff-bcca2b17db5d9ee49a78774f858b95e4R25 . Two reasons:

  1. It can be implemented as a function, instead of a class
  2. implement the function inside google::internal namespace enables us to test this function, but does not need to keep an member telemetry_ inside the MetadataApiServer class

@StevenYCChou StevenYCChou changed the title Add instrumentation exposition into api server RFC: Add instrumentation exposition into api server Nov 22, 2018
@StevenYCChou StevenYCChou changed the title RFC: Add instrumentation exposition into api server RFC: Add instrumentation and expose measurement in prometheus format Dec 6, 2018
@StevenYCChou
Copy link
Author

@g-easy can you take a brief look on this PR? I want to get your feedback on

  • integrating opencensus-cpp library with building system (CMake and make)
  • code structure on using opencensus-cpp
  • application testing strategy with opencensus-cpp

As we add more metrics, some of the metrics are other objects -
e.g. KubernetesUpdater. View registration should happen after all the
metrics are accessed. The lifecycle of all the measure are registered
outside of MetadataAgent, so move view registration to the last step
in main function.
Copy link

@g-easy g-easy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build system stuff looks reasonable.
Test looks great.
Couple of comments on initialization.


namespace {

::prometheus::TextSerializer text_serializer;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are text_serializer and exporter trivially constructible? Maybe make them both static pointers inside SerializeMetricsToPrometheusTextFormat?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even create them every time SerializeMetricsToPrometheusTextFormat is called.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now creating objects every time when SerializeMetricsToPrometheusTextFormat is called, because I cannot make sure whether my implementation on static pointer is the right approach, so eventually switch to object approach.

As for now, SerializeMetricsToPrometheusTextFormat() will only be called when server is called on /metrics specifically, so I don't think it will have huge performance impact and don't want to optimize early without my solid knowledge on static pointer approach.

@StevenYCChou StevenYCChou changed the title RFC: Add instrumentation and expose measurement in prometheus format Add instrumentation and expose measurement in prometheus format Dec 7, 2018
@StevenYCChou
Copy link
Author

change from RFC to normal one, because this one has tests included. high level review is still welcome first so we can get feedback loop soon.

Copy link
Member

@igorpeshansky igorpeshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level comments. I did not look at the Makefile changes yet — those will take more effort to review.

[submodule "lib/prometheus-cpp"]
path = lib/prometheus-cpp
url = https://github.com/jupp0r/prometheus-cpp
[submodule "lib/opencensus-cpp"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always wary of targeting an untagged and unreleased commit. We can fix this later, but just wanted to note this as a concern.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acknowledged. I just follow the pattern we used on other submodules. Once opencensus-cpp offically release their new version, we should check in that specific version. for now, I targeted on a commit which includes all the functionality we need, including prometheus-exporter.


namespace {

::prometheus::TextSerializer text_serializer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even create them every time SerializeMetricsToPrometheusTextFormat is called.


// View Descriptor accessors. If the view descriptor variable is not
// initialized, these methods will initialize the variable.
static const ::opencensus::stats::ViewDescriptor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be private?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. I tried to implement GetGceApiRequestErrorsCumulativeViewIntData, but it the data cannot propogated to this view because 1. the view needs to be created before the metric is recorded, and 2. for testing, we need to call TestUtils::Flush() to propagate to view. If this functino is private, it cannot be used in oauth2_unittest. We may put it as private function and make a class in oauth2_unittest as friend class, but if we add more metrics across files, you will friend a couple more - so I am not sure that's what we want.

In order to create a view from the view descriptor, and run TestUtils::Flush(), I put this function as public function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about ::opencensus::stats::StatsExporter::GetViewData().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the standard trick for accessing private functions in tests is to mark them protected in the implementation class, and add a subclass in the test that makes them public. This can be made to work on non-virtual and static functions as well.


// View Descriptor accessors. If the view descriptor variable is not
// initialized, these methods will initialize the variable.
static const ::opencensus::stats::ViewDescriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about ::opencensus::stats::StatsExporter::GetViewData().

.set_measure(measure_name)
.set_aggregation(opencensus::stats::Aggregation::Count());
// remove view and flush the result before the actual test.
::opencensus::stats::StatsExporter::RemoveView(view_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we clean up in line 57, why do we also need to clean up here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the test self-contained, you need to avoid other views with view_name registered. In current unittest there is only one view called test_view, so it shouldn't be an issue; however if other view registered the view_name already, the data of the view will be aggregated on top of the original view.

The view on line 57 is to be a good citizen to clean up the view, so other test cases won't be affected by this test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we make the view name unique to this test, then (e.g., by using __FILE__ and __LINE__)? It seems weird that the first thing we do is unregister a newly-created view.


auth.GetAuthHeaderValue();

// Flush records to propagate to views.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment belongs on the next statement, doesn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flush records to propagate to the view we first initialize in line 148 - this comment explains the next two lines.

"test_view_1 1.000000 [0-9]*\n"));

// clean up the view which we registered for export.
::opencensus::stats::StatsExporter::RemoveView(view_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the Finally class from oauth2.cc?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but won't do it because it's in another .cc file - in order to use it, we need to refactor, but I don't think it's good idea to add much complexity to this already complex PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather pay the cost of 10 lines of code duplication for a proper RAII pattern than risk the flakiness resulting from forgetting to clean this up in the future.

::opencensus::stats::StatsExporter::RemoveView(::google::Metrics::kGceApiRequestErrors);
::opencensus::stats::testing::TestUtils::Flush();

::google::Metrics::RecordGceApiRequestErrors(1, "test_kind");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? If you've removed the view from StatsExporter and never re-registered it, how are those values getting exported into SerializeMetricsToPrometheusTextFormat?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand - I think you suggested that when we register the measurement internally, we also register the view for it?

For test, to make this testcase not affected by other test, we need to remove the view first (first 2 lines) and let the logic inside RecordGceApiRequestErrorsregister the view and record the measurement, so later the value is exported by SerializeMetricsToPrometheusTextFormat

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did suggest that, but the fact that we remove the view and expect it to be re-registered threw me off a bit in this test. A comment to remind readers of this setup would be helpful. In fact, because the result of GceApiRequestErrorsInitialize is cached, would the view even be re-registered at all?

As for having to unregister the view in the first place, this strikes me as pretty fragile, and, as I said in another comment, gives me no confidence that we're testing what we're using in production. I can see a couple of ways of addressing this, the most sensible being to retrieve the value for this metric using ::opencensus::stats::StatsExporter::GetViewData()[1] before calling the method under test and do the same after calling the method under test, and check that the value changed in the expected way. We could then change this specific test to make sure that if we touch the GceApiRequestErrors counter, it appears in the prometheus output with some value. Does this make sense?

[1] I wish opencensus-cpp provided a way to retrieve the counter value by view name directly from the StatsExporter.


// Remove existing view on kGceApiRequestErrors to avoid
// other tests affecting the result.
// Flush to propagate existing records to views, including records from other
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment belong on the next statement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it cannot be grouped into a section of comments?

TEST(SerializeToPrometheusTextTest, SingleOpencensusView) {
const char measure_name[] = "test_measure";
const char view_name[] = "test_view";
const ::opencensus::stats::MeasureInt64 test_measure =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed function could be reused verbatim to implement GceApiRequestErrors (instead of GceApiRequestErrorsInitialize)... Basically, I'm concerned about the amount of setup these tests are doing, and it seems like this is fragile in that it could easily diverge from the setup in production code. The closer we can get this to production code, the more confidence we would have that we're testing what we're using.

"test_view_1 1.000000 [0-9]*\n"));

// clean up the view which we registered for export.
::opencensus::stats::StatsExporter::RemoveView(view_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather pay the cost of 10 lines of code duplication for a proper RAII pattern than risk the flakiness resulting from forgetting to clean this up in the future.

.set_measure(measure_name)
.set_aggregation(opencensus::stats::Aggregation::Count());
// remove view and flush the result before the actual test.
::opencensus::stats::StatsExporter::RemoveView(view_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we make the view name unique to this test, then (e.g., by using __FILE__ and __LINE__)? It seems weird that the first thing we do is unregister a newly-created view.

::opencensus::stats::StatsExporter::RemoveView(::google::Metrics::kGceApiRequestErrors);
::opencensus::stats::testing::TestUtils::Flush();

::google::Metrics::RecordGceApiRequestErrors(1, "test_kind");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did suggest that, but the fact that we remove the view and expect it to be re-registered threw me off a bit in this test. A comment to remind readers of this setup would be helpful. In fact, because the result of GceApiRequestErrorsInitialize is cached, would the view even be re-registered at all?

As for having to unregister the view in the first place, this strikes me as pretty fragile, and, as I said in another comment, gives me no confidence that we're testing what we're using in production. I can see a couple of ways of addressing this, the most sensible being to retrieve the value for this metric using ::opencensus::stats::StatsExporter::GetViewData()[1] before calling the method under test and do the same after calling the method under test, and check that the value changed in the expected way. We could then change this specific test to make sure that if we touch the GceApiRequestErrors counter, it appears in the prometheus output with some value. Does this make sense?

[1] I wish opencensus-cpp provided a way to retrieve the counter value by view name directly from the StatsExporter.

std::string(test_info_->name()) + "_creds.json",
"{\"client_email\":\"user@example.com\",\"private_key\":\"some_key\"}");
Configuration config(std::istringstream(
"CredentialsFile: '" + credentials_file.FullPath().native() + "'\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this be a 2-space indent?

// affecting the result, then flush to propagate existing records
// to views, including records from other tests.
::opencensus::stats::StatsExporter::RemoveView(
::google::Metrics::kGceApiRequestErrors);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're inside the google namespace here. Just Metrics::kGceApiRequestErrors should do.

// StatsExporter::AddView always add a new view and destroy the old one,
// which is not suitable for our
// Metrics::GceApiRequestErrorsCumulativeViewDescriptor().
::google::Metrics::GceApiRequestErrorsCumulativeViewDescriptor().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're inside the google namespace here. Just Metrics::GceApiRequestErrorsCumulativeViewDescriptor() should do.

::testing::Contains(::testing::Pair(
Metrics::GceApiRequestErrorsCumulativeViewDescriptor(),
::testing::Property(
&::opencensus::stats::ViewData::int_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pull this out into Metrics::GetInt64Data(const std::string& name)? That way this test could become:

  EXPECT_THAT(Metrics::GetInt64Data(Metrics::kGceApiRequestErrors),
              ::testing::UnorderedElementsAre(
                ::testing::Pair(::testing::ElementsAre("oauth2"), 1)));


// View Descriptor accessors. If the view descriptor variable is not
// initialized, these methods will initialize the variable.
static const ::opencensus::stats::ViewDescriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the standard trick for accessing private functions in tests is to mark them protected in the implementation class, and add a subclass in the test that makes them public. This can be made to work on non-virtual and static functions as well.

public:
OAuth2(const Environment& environment)
: OAuth2(environment, ExpirationImpl<std::chrono::system_clock>::New()) {}
: OAuth2(environment, ExpirationImpl<std::chrono::system_clock>::New()) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we change the indentation here? Should we just revert this file altogether?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants